Beta distribution

[Plots of the probability density function and cumulative distribution function of the Beta distribution.]

Parameters:    \alpha > 0 shape (real), \beta > 0 shape (real)
Support:       x \in (0, 1)
PDF:           \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}
CDF:           I_x(\alpha,\beta)
Mean:          \frac{\alpha}{\alpha+\beta}
Median:        I_{0.5}^{-1}(\alpha,\beta) (no closed form)
Mode:          \frac{\alpha-1}{\alpha+\beta-2} for \alpha>1, \beta>1
Variance:      \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
Skewness:      \frac{2\,(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}
Ex. kurtosis:  see text
Entropy:       see text
MGF:           1 + \sum_{k=1}^{\infty} \left( \prod_{r=0}^{k-1} \frac{\alpha+r}{\alpha+\beta+r} \right) \frac{t^k}{k!}
CF:            {}_1F_1(\alpha; \alpha+\beta; i\,t)

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval (0, 1) and parameterized by two positive shape parameters, typically denoted by α and β. It is the special case of the Dirichlet distribution with only two parameters. Just as the Dirichlet distribution is the conjugate prior of the multinomial and categorical distributions, the beta distribution is the conjugate prior of the binomial and Bernoulli distributions. In Bayesian statistics, it can be seen as the likelihood of the parameter p of a binomial distribution after observing α − 1 independent events with probability p and β − 1 with probability 1 − p.

Characterization

Probability density function

The probability density function of the beta distribution is:


\begin{align}
f(x;\alpha,\beta) & = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1} (1-u)^{\beta-1}\, du} \\[6pt]
& = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \\[6pt]
& = \frac{1}{\mathrm{B}(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}
\end{align}

where \Gamma is the gamma function. The beta function, B, appears as a normalization constant to ensure that the total probability integrates to unity.
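
As a numerical sanity check, the following minimal sketch (Python standard library only; the helper name beta_pdf is ours, not from any reference) evaluates the density via the gamma-function form and confirms that it integrates to unity:

import math

def beta_pdf(x, a, b):
    # f(x; a, b) = Gamma(a + b) / (Gamma(a) * Gamma(b)) * x^(a-1) * (1-x)^(b-1)
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

# Midpoint-rule integration over (0, 1) should return roughly 1.
a, b, n = 2.0, 5.0, 10_000
print(sum(beta_pdf((i + 0.5) / n, a, b) for i in range(n)) / n)  # ~1.0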

Cumulative distribution function

The cumulative distribution function is

F(x;\alpha,\beta) = \frac{\mathrm{B}_x(\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta)

where \mathrm{B}_x(\alpha,\beta) is the incomplete beta function and I_x(\alpha,\beta) is the regularized incomplete beta function.
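
For completeness, a minimal sketch of evaluating the CDF, assuming SciPy is available (scipy.special.betainc computes the regularized incomplete beta function directly):

from scipy.special import betainc

a, b = 2.0, 5.0
print(betainc(a, b, 0.3))  # F(0.3; 2, 5) = I_0.3(2, 5)
print(betainc(a, b, 1.0))  # 1.0 at the right endpoint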

Properties

The expected value (\mu), variance (second central moment), skewness, and excess kurtosis of a beta-distributed random variable X with parameters α and β are given below. The mean, second raw moment, and variance are:

\begin{align}
\mu = \operatorname{E}(X) &= \frac{\alpha}{\alpha+\beta} \\
\operatorname{E}(X^2) &= \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} \\
\operatorname{Var}(X) = \operatorname{E}\left[(X-\mu)^2\right] &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
\end{align}

while the first central moment vanishes: \operatorname{E}(X-\mu) = 0.

The skewness is

\gamma_1 = \frac{\operatorname{E}\left[(X-\mu)^3\right]}{\sigma^3} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.

The excess kurtosis is:

\gamma_2 = \frac{\operatorname{E}\left[(X-\mu)^4\right]}{\sigma^4} - 3 = \frac{6\left[\alpha^3-\alpha^2(2\beta-1)+\beta^2(\beta+1)-2\alpha\beta(\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}.
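
A short sketch, assuming SciPy, that checks these closed forms against library values (the variable names are illustrative):

from math import sqrt
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
s = a + b
mean = a / s
var = a * b / (s ** 2 * (s + 1))
skew = 2 * (b - a) * sqrt(s + 1) / ((s + 2) * sqrt(a * b))
exkurt = 6 * (a ** 3 - a ** 2 * (2 * b - 1) + b ** 2 * (b + 1)
              - 2 * a * b * (b + 2)) / (a * b * (s + 2) * (s + 3))

# SciPy's 'mvsk' returns mean, variance, skewness, and excess kurtosis.
print((mean, var, skew, exkurt))
print(beta_dist.stats(a, b, moments='mvsk'))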

In general, the kth raw moment is given by


\operatorname{E}(X^k) = \frac{\operatorname{B}(\alpha+k, \beta)}{\operatorname{B}(\alpha,\beta)} = \frac{(\alpha)_{k}}{(\alpha + \beta)_{k}},

where (x)_{k} is the Pochhammer symbol denoting the rising factorial. The moments can also be computed recursively as


\operatorname{E}(X^k) = \frac{\alpha+k-1}{\alpha+\beta+k-1}\operatorname{E}(X^{k-1}).

One can also show that


\operatorname{E}(\log X) = \psi(\alpha)-\psi(\alpha+\beta)
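
A minimal sketch of the moment recursion and of the digamma identity, again assuming SciPy for the special functions (raw_moment is an illustrative name):

from scipy.special import digamma

def raw_moment(k, a, b):
    # E(X^k) = prod_{r=0}^{k-1} (a + r) / (a + b + r), built up step by step.
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)
    return m

a, b = 2.0, 5.0
print(raw_moment(1, a, b))          # mean, a / (a + b)
print(raw_moment(2, a, b))          # a (a + 1) / ((a + b)(a + b + 1))
print(digamma(a) - digamma(a + b))  # E(log X)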

Quantities of information

Given two beta-distributed random variables, X ~ Beta(α, β) and Y ~ Beta(α', β'), the information entropy of X is[1]


H(X) = \ln\mathrm{B}(\alpha,\beta)-(\alpha-1)\psi(\alpha)-(\beta-1)\psi(\beta)+(\alpha+\beta-2)\psi(\alpha+\beta)

where \psi is the digamma function.

The cross entropy is

H(X,Y) = \ln\mathrm{B}(\alpha',\beta')-(\alpha'-1)\psi(\alpha)-(\beta'-1)\psi(\beta)+(\alpha'+\beta'-2)\psi(\alpha+\beta).

It follows that the Kullback–Leibler divergence between these two beta distributions is


 D_{\mathrm{KL}}(X,Y) = \ln\frac{\mathrm{B}(\alpha',\beta')}
                                {\mathrm{B}(\alpha,\beta)} -
                        (\alpha'-\alpha)\psi(\alpha) - (\beta'-\beta)\psi(\beta) + 
                        (\alpha'-\alpha+\beta'-\beta)\psi(\alpha+\beta).
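
The identity D_KL(X, Y) = H(X, Y) − H(X) makes these formulas easy to check numerically; a sketch assuming SciPy (the function names are ours):

from scipy.special import betaln, psi  # psi is the digamma function

def beta_entropy(a, b):
    return (betaln(a, b) - (a - 1) * psi(a) - (b - 1) * psi(b)
            + (a + b - 2) * psi(a + b))

def beta_cross_entropy(a, b, a2, b2):
    return (betaln(a2, b2) - (a2 - 1) * psi(a) - (b2 - 1) * psi(b)
            + (a2 + b2 - 2) * psi(a + b))

def beta_kl(a, b, a2, b2):
    # Expanding H(X, Y) - H(X) recovers the divergence formula in the text.
    return beta_cross_entropy(a, b, a2, b2) - beta_entropy(a, b)

print(beta_kl(2.0, 5.0, 2.0, 5.0))  # 0.0 for identical distributions
print(beta_kl(2.0, 5.0, 3.0, 3.0))  # strictly positive otherwise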

Shapes

The beta density function can take on different shapes depending on the values of the two parameters: it is U-shaped (with its mass piling up at both 0 and 1) when α < 1 and β < 1; uniform when α = β = 1; strictly decreasing when α ≤ 1 ≤ β (excluding α = β = 1); strictly increasing when β ≤ 1 ≤ α (excluding α = β = 1); and unimodal when α > 1 and β > 1.

Moreover, if α = β then the density function is symmetric about 1/2.
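
A quick way to see these regimes is to tabulate the density at a few points for representative parameter pairs; a sketch assuming SciPy:

from scipy.stats import beta as beta_dist

for a, b, label in [(0.5, 0.5, "U-shaped"),
                    (1.0, 1.0, "uniform"),
                    (0.5, 2.0, "strictly decreasing"),
                    (2.0, 2.0, "symmetric unimodal")]:
    print(label, [round(beta_dist.pdf(x, a, b), 3) for x in (0.1, 0.5, 0.9)])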

Parameter estimation

Let

\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i

be the sample mean and

v = \frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2

be the sample variance. The method-of-moments estimates of the parameters are

\alpha = \bar{x} \left(\frac{\bar{x} (1 - \bar{x})}{v} - 1 \right),
\beta = (1-\bar{x}) \left(\frac{\bar{x} (1 - \bar{x})}{v} - 1 \right).

When the distribution is required over an interval other than [0, 1], say [\ell, h], replace \bar{x} with \frac{\bar{x}-\ell}{h-\ell} and v with \frac{v}{(h-\ell)^2} in the above equations.[2][3]
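
A direct transcription of these estimators, with a simulation check on synthetic data (SciPy is assumed only for sampling; fit_beta_moments is an illustrative name):

def fit_beta_moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased variance, as in the text
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

# Self-check on synthetic Beta(2, 5) data.
from scipy.stats import beta as beta_dist
sample = beta_dist.rvs(2.0, 5.0, size=100_000, random_state=0)
print(fit_beta_moments(sample))  # roughly (2.0, 5.0)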

Related distributions

Beta(i, j) with integer values of i and j is the distribution of the i-th order statistic (the i-th smallest value) of a sample of i + j − 1 independent random variables uniformly distributed between 0 and 1. The cumulative probability from 0 to x is thus the probability that the i-th smallest value is less than x; in other words, it is the probability that at least i of the random variables are less than x, a probability given by summing over the binomial distribution with its p parameter set to x. This shows the intimate connection between the beta distribution and the binomial distribution.
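
This connection is easy to verify by simulation; a standard-library sketch (the parameter choices are illustrative):

import random

random.seed(0)
i, j = 2, 3  # each trial sorts i + j - 1 = 4 uniform draws
draws = [sorted(random.random() for _ in range(i + j - 1))[i - 1]
         for _ in range(100_000)]
print(sum(draws) / len(draws))  # ~ i / (i + j) = 0.4, the Beta(2, 3) mean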

Applications

Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, p should be estimated as \frac{s+1}{n+2}. For example, after observing 9 successes in 10 trials, the estimate is 10/12 ≈ 0.83. This estimate may be regarded as the expected value of the posterior distribution over p, namely Beta(s + 1, n − s + 1), which is given by Bayes' rule if one assumes a uniform prior over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials.

Bayesian statistics

Beta distributions are used extensively in Bayesian statistics, since they provide a family of conjugate prior distributions for binomial (including Bernoulli) and geometric distributions. The Beta(0, 0) distribution is an improper prior that is sometimes used to represent ignorance of parameter values.
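
The conjugate update itself is one line: a Beta(a, b) prior combined with s successes and f failures yields a Beta(a + s, b + f) posterior. A minimal sketch (the names are ours):

def update_beta(a, b, successes, failures):
    # Beta(a, b) prior + binomial data -> Beta(a + s, b + f) posterior.
    return a + successes, b + failures

a, b = 1.0, 1.0                 # uniform prior, Beta(1, 1)
a, b = update_beta(a, b, 7, 3)  # observe 7 successes and 3 failures
print((a, b), a / (a + b))      # Beta(8, 4); mean (s + 1)/(n + 2) = 2/3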

Task duration modeling

The beta distribution can be used to model events that are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, the critical path method (CPM), and other project management and control systems to describe the time to completion of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

 \begin{align}
  \mu(X) & {} = \frac{a + 4b + c}{6} \\
  \sigma(X) & {} = \frac{c-a}{6}
\end{align}

where a is the minimum, c is the maximum, and b is the most likely value.

Using this set of approximations is known as three-point estimation. The approximations are exact only for particular values of α and β, specifically when[4]

\alpha = 3 - \sqrt{2},
\beta = 3 + \sqrt{2},

or vice versa.

These are notably poor approximations for most other beta distributions, exhibiting average errors of 40% in the mean and 549% in the variance.[5][6][7]
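
A sketch, assuming SciPy, that reproduces the exactness for the shape parameters above (the interval endpoints are arbitrary illustrative values):

from math import sqrt
from scipy.stats import beta as beta_dist

lo, hi = 2.0, 14.0                       # illustrative task-time bounds a and c
alpha, beta_ = 3 - sqrt(2), 3 + sqrt(2)  # the shapes for which PERT is exact
mode = lo + (hi - lo) * (alpha - 1) / (alpha + beta_ - 2)

pert_mean = (lo + 4 * mode + hi) / 6
pert_sd = (hi - lo) / 6

exact_mean, exact_var = beta_dist.stats(alpha, beta_, loc=lo, scale=hi - lo)
print(pert_mean, float(exact_mean))      # identical for these shapes
print(pert_sd, sqrt(float(exact_var)))   # both equal (hi - lo)/6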

Information theory

We give one example use of the beta distribution in information theory, namely the information-theoretic performance analysis of a communication system. In sensor array systems, the distribution of the inner product of two random vectors is frequently used for performance estimation. Assume that s and v are unit vectors drawn isotropically and independently from the (M − 1)-dimensional nullspace of h, where s, v and h lie in C^M and the elements of h are i.i.d. complex Gaussian random variables. Then the squared magnitude of their inner product, |s^H v|^2, is Beta(1, M − 2) distributed.
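
This can be checked by simulation. Because the result is invariant under a unitary change of basis, isotropic unit vectors in the nullspace can be generated directly in C^(M−1); a sketch assuming NumPy (names ours):

import numpy as np

rng = np.random.default_rng(0)
M, trials = 6, 100_000

def iso_unit(dim):
    # An isotropic unit vector in C^dim.
    z = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    return z / np.linalg.norm(z)

vals = [abs(np.vdot(iso_unit(M - 1), iso_unit(M - 1))) ** 2
        for _ in range(trials)]
print(np.mean(vals))  # ~ 1 / (M - 1), the mean of Beta(1, M - 2)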

Four parameters

A beta distribution with the two shape parameters α and β is supported on the range [0,1]. It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum and maximum values of the distribution.[8]

The probability density function of the four parameter beta distribution is given by


f(y; \alpha, \beta, a, b) = \frac{1}{B(\alpha, \beta)} \frac{ (y-a)^{\alpha-1} (b-y)^{\beta-1} }{(b-a)^{\alpha+\beta-1}}

The standard form can be obtained by letting

x = \frac{y-a}{b-a}.
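
A sketch of the density and of sampling via this change of variables, assuming SciPy (the helper names beta4_pdf and beta4_sample are ours):

from scipy.stats import beta as beta_dist

def beta4_pdf(y, alpha, beta_, a, b):
    # Matches the density above: a standard beta in x = (y - a)/(b - a),
    # divided by the Jacobian (b - a).
    return beta_dist.pdf((y - a) / (b - a), alpha, beta_) / (b - a)

def beta4_sample(alpha, beta_, a, b, size, seed=0):
    x = beta_dist.rvs(alpha, beta_, size=size, random_state=seed)
    return a + (b - a) * x

ys = beta4_sample(2.0, 5.0, a=-1.0, b=3.0, size=100_000)
print(ys.mean())  # ~ a + (b - a) * alpha / (alpha + beta) = 1/7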

References

  1. Verdugo Lazo, A. C. G. and Rathie, P. N. (1978). "On the entropy of continuous probability distributions". IEEE Transactions on Information Theory, IT-24, pp. 120–122.
  2. Engineering Statistics Handbook.
  3. Brighton Webs Ltd. Data & Analysis Services for Industry & Education.
  4. Grubbs, Frank E. (1962). "Attempts to Validate Certain PERT Statistics or 'Picking on PERT'". Operations Research, 10(6), pp. 912–915.
  5. Keefer, Donald L. and Verdini, William A. (1993). "Better Estimation of PERT Activity Time Parameters". Management Science, 39(9), pp. 1086–1091.
  6. Keefer, Donald L. and Bodily, Samuel E. (1983). "Three-Point Approximations for Continuous Random Variables". Management Science, 29(5), pp. 595–609.
  7. DRMI Newsletter, Issue 12, April 8, 2005.
  8. Beta4 distribution.
